TensorLayer: A Versatile Library for Efficient Deep Learning Development
Deep learning has enabled major advances in the fields of computer vision,
natural language processing, and multimedia among many others. Developing a
deep learning system is arduous and complex, as it involves constructing neural
network architectures, managing models during and after training, tuning the
optimization process, and preprocessing and organizing data. TensorLayer is a
versatile Python library that aims to help researchers and engineers
efficiently
develop deep learning systems. It offers rich abstractions for neural networks,
model and data management, and a parallel workflow mechanism. While boosting
efficiency, TensorLayer maintains both performance and scalability. TensorLayer
was released in September 2016 on GitHub, and has helped people from academia
and industry develop real-world applications of deep learning.
Comment: ACM Multimedia 201
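As a flavor of the library's layer abstractions, here is a minimal sketch of a
small classifier written in the style of the TensorLayer 1.x MNIST tutorials.
It assumes TensorFlow 1.x graph mode, and exact layer signatures vary across
TensorLayer versions, so treat the calls as indicative rather than definitive.

# Minimal multilayer-perceptron sketch in the style of the TensorLayer 1.x
# tutorials (assumes TensorFlow 1.x graph mode; signatures vary by version).
import tensorflow as tf
import tensorlayer as tl

x = tf.placeholder(tf.float32, [None, 784], name='x')   # flattened images
y_ = tf.placeholder(tf.int64, [None], name='y_')        # integer class labels

net = tl.layers.InputLayer(x, name='input')
net = tl.layers.DenseLayer(net, n_units=800, act=tf.nn.relu, name='relu1')
net = tl.layers.DenseLayer(net, n_units=10, act=tf.identity, name='output')

cost = tl.cost.cross_entropy(net.outputs, y_, name='cost')
train_op = tf.train.AdamOptimizer(1e-4).minimize(cost)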
Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo
Resource provisioning in multi-tenant stream processing systems faces the
dual challenges of keeping resource utilization high (without
over-provisioning), and ensuring performance isolation. In our common
production use cases, where streaming workloads have to meet latency targets
and avoid breaching service-level agreements, existing solutions are incapable
of handling the wide variability of user needs. Our framework, Cameo, uses
fine-grained stream processing (inspired by actor computation models) and
provides high resource utilization while meeting latency targets. Cameo
dynamically calculates and propagates priorities of events based on user
latency targets and query semantics. Experiments on Microsoft Azure show that
compared to the state of the art, the Cameo framework: i) reduces query latency
by 2.7X in single-tenant settings, ii) reduces query latency by 4.6X in
multi-tenant scenarios, and iii) weathers transient workload spikes.
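The core idea, deriving event priorities from per-tenant latency targets, can
be illustrated with a toy earliest-deadline-first dispatcher. The sketch below
is a minimal illustration of deadline-as-priority scheduling, not Cameo's
actual implementation; the Event and Dispatcher names are hypothetical.

# Toy earliest-deadline-first dispatcher: each event's deadline (arrival time
# plus its tenant's latency target) doubles as its scheduling priority.
# Hypothetical names; not Cameo's API.
import heapq
import time

class Event:
    def __init__(self, payload, latency_target_s):
        self.payload = payload
        self.deadline = time.monotonic() + latency_target_s

    def __lt__(self, other):
        return self.deadline < other.deadline   # earlier deadline = higher priority

class Dispatcher:
    def __init__(self):
        self._queue = []

    def submit(self, event):
        heapq.heappush(self._queue, event)      # O(log n) insert by deadline

    def run_next(self, handler):
        if self._queue:
            handler(heapq.heappop(self._queue)) # most urgent event first

d = Dispatcher()
d.submit(Event("tenant-A record", latency_target_s=0.5))
d.submit(Event("tenant-B record", latency_target_s=0.05))   # tighter target
d.run_next(lambda e: print("processing", e.payload))        # tenant-B runs first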
OpenPARF: An Open-Source Placement and Routing Framework for Large-Scale Heterogeneous FPGAs with Deep Learning Toolkit
This paper proposes OpenPARF, an open-source placement and routing framework
for large-scale FPGA designs. OpenPARF is implemented with the deep learning
toolkit PyTorch and supports massive parallelization on GPU. The framework
proposes a novel asymmetric multi-electrostatic field system to solve FPGA
placement. It considers fine-grained routing resources inside configurable
logic blocks (CLBs) for FPGA routing and supports large-scale irregular routing
resource graphs. Experimental results on ISPD 2016 and ISPD 2017 FPGA contest
benchmarks and industrial benchmarks demonstrate that OpenPARF can achieve
0.4-12.7% improvement in routed wirelength and a more than 2X speedup in
placement. We believe that OpenPARF can pave the way for developing FPGA
physical design engines and stimulate further research on related topics.
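To give a flavor of analytical placement with a deep learning toolkit, the toy
sketch below minimizes a smooth wirelength proxy plus a pairwise spreading
penalty by gradient descent in PyTorch. It is purely illustrative: OpenPARF's
asymmetric multi-electrostatic formulation is far more involved, and all sizes
and weights here are arbitrary assumptions.

# Toy analytical-placement sketch in PyTorch: a smooth wirelength proxy plus
# a pairwise spreading penalty, optimized by gradient descent. Illustrative
# only; not OpenPARF's multi-electrostatic formulation.
import torch

n_cells, n_nets = 64, 32
pos = torch.rand(n_cells, 2, requires_grad=True)                 # (x, y) per cell
nets = [torch.randint(0, n_cells, (4,)) for _ in range(n_nets)]  # 4-pin nets

def wirelength(pos):
    # Log-sum-exp smooth approximation of half-perimeter wirelength per net.
    gamma = 0.1
    total = 0.0
    for pins in nets:
        p = pos[pins]                                            # (4, 2) pin coords
        total = total + gamma * (torch.logsumexp(p / gamma, dim=0)
                                 + torch.logsumexp(-p / gamma, dim=0)).sum()
    return total

def spreading(pos):
    # Smooth pairwise repulsion that discourages cell overlap: a crude
    # stand-in for the density (electrostatic) force of analytical placers.
    diff = pos.unsqueeze(0) - pos.unsqueeze(1)                   # (n, n, 2)
    dist2 = (diff ** 2).sum(-1) + 1e-3                           # avoid div-by-zero
    return (1.0 / dist2).sum()

opt = torch.optim.Adam([pos], lr=0.01)
for step in range(200):
    opt.zero_grad()
    loss = wirelength(pos) + 0.01 * spreading(pos)
    loss.backward()
    opt.step()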
KungFu: Making Training in Distributed Machine Learning Adaptive
When using distributed machine learning (ML) systems to train models on a
cluster of worker machines, users must configure a large number of parameters:
hyper-parameters (e.g. the batch size and the learning rate) affect model
convergence; system parameters (e.g. the number of workers and their
communication topology) impact training performance. In current systems,
adapting such parameters during training is ill-supported. Users must set
system parameters at deployment time, and provide fixed adaptation schedules
for hyper-parameters in the training program. We describe KungFu, a
distributed ML library for TensorFlow that is designed to enable adaptive
training. KungFu allows users to express high-level Adaptation Policies (APs)
that describe how to change hyper- and system parameters during training. APs
take real-time monitored metrics (e.g. signal-to-noise ratios and noise scale)
as input and trigger control actions (e.g. cluster rescaling or
synchronisation strategy updates). For execution, APs are translated into
monitoring and control operators, which are embedded in the dataflow graph.
APs exploit an efficient asynchronous collective communication layer, which
ensures concurrency and consistency of monitoring and adaptation operations.
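To make the AP concept concrete, the hedged sketch below mimics such a control
loop in plain Python: a monitored metric is sampled each step, and a control
action fires when it crosses a threshold. All names here (noise_scale,
resize_cluster, AdaptationPolicy) are hypothetical placeholders, not KungFu's
actual API.

# Hypothetical sketch of an Adaptation Policy control loop: sample a training
# metric each step and trigger a control action on a threshold crossing.
# Placeholder names; not the real KungFu API.
import random

def noise_scale(step):
    # Stand-in for a monitored metric such as the gradient noise scale,
    # which typically decays as training converges.
    return 100.0 / (step + 1) + random.uniform(0.0, 0.5)

def resize_cluster(num_workers):
    # Stand-in for a system control action (e.g. cluster rescaling).
    print(f"rescaling to {num_workers} workers")

class AdaptationPolicy:
    def __init__(self, threshold, max_workers):
        self.threshold = threshold
        self.max_workers = max_workers
        self.workers = 1

    def on_step(self, step):
        # A large noise scale suggests larger batches (hence more workers)
        # would still help convergence; scale out until the cap is reached.
        if noise_scale(step) > self.threshold and self.workers < self.max_workers:
            self.workers *= 2
            resize_cluster(self.workers)

policy = AdaptationPolicy(threshold=10.0, max_workers=8)
for step in range(100):
    policy.on_step(step)   # in KungFu this logic runs as dataflow operators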